{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "A8o2XLKIz84Y"
      },
      "source": [
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/v1/claude-3-agent.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/v1/claude-3-agent.ipynb)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "1eOnr6z_zLoc",
        "outputId": "53f4214a-c00b-4dea-d1fa-38d0d6693fe9"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Python 3.10.12\n"
          ]
        }
      ],
      "source": [
        "!python --version"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qQG4iSxVxw8f"
      },
      "source": [
        "# Claude 3 Opus Conversation Agent with RAG"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ov6TCS7bx1oI"
      },
      "source": [
        "In this example, we'll be building a RAG agent with LangChain v1. We will use Claude 3 Opus for our LLM, Voyage AI embed model for knowledge embeddings, and Pinecone to power our knowledge retrieval.\n",
        "\n",
        "To begin, let's install the prerequisites:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "zshhLDrgbFKk",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "ed49398c-dc7c-4c9c-e6c3-1a1910d52006"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m848.6/848.6 kB\u001b[0m \u001b[31m4.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m211.0/211.0 kB\u001b[0m \u001b[31m18.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m507.1/507.1 kB\u001b[0m \u001b[31m30.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m75.6/75.6 kB\u001b[0m \u001b[31m8.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m115.3/115.3 kB\u001b[0m \u001b[31m13.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m15.0 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m77.8/77.8 kB\u001b[0m \u001b[31m9.2 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m58.3/58.3 kB\u001b[0m \u001b[31m6.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K     \u001b[90m\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u2501\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m13.5 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h"
          ]
        }
      ],
      "source": [
        "!pip install -qU \\\n",
        "    langchain==0.1.11 \\\n",
        "    langchain-core==0.1.30 \\\n",
        "    langchain-community==0.0.27 \\\n",
        "    langchain-anthropic==0.1.4 \\\n",
        "    langchainhub==0.1.15 \\\n",
        "    anthropic==0.19.1 \\\n",
        "    voyageai==0.2.1 \\\n",
        "    pinecone-client==3.1.0 \\\n",
        "    datasets==2.16.1"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bpKfZkUYzQhB"
      },
      "source": [
        "## Finding Knowledge"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JDTQoxcNzUa8"
      },
      "source": [
        "The first thing we need for an agent using RAG is somewhere we want to pull knowledge from. We will use v2 of the AI ArXiv dataset, available on Hugging Face Datasets at [`jamescalam/ai-arxiv2-chunks`](https://huggingface.co/datasets/jamescalam/ai-arxiv2-chunks).\n",
        "\n",
        "_Note: we're using the prechunked dataset. For the raw version see [`jamescalam/ai-arxiv2`](https://huggingface.co/datasets/jamescalam/ai-arxiv2)._"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 297,
          "referenced_widgets": [
            "4d5c6f53e82948b18b45877e2e30fd78",
            "7036b643c9264b09b0f93cb6a1a8b01f",
            "fb92dd97e68643099b0d0f1b4983f5f1",
            "91c3b00e77444bb79d9f73314800471e",
            "3e1d39ab6d3a4cf5827c9b3e1d4f0df1",
            "f58fb2da52bc43a985449910daaac230",
            "449afd685fd94354b64aa43a4763ab1d",
            "43a3c63554894d268e3bf747f7e0289d",
            "d902beef883e4c65bd2d6cb161423bae",
            "ce5735788cd14803abac28fa2c42d168",
            "a8928cd888a648c9b7d52739f38c5bda",
            "0168cb9e015b4e119fcdc7beca2e21a3",
            "a0e93df2955b44208479a4b416cd1085",
            "6f58f8d6374b45b4b6a0d9fcaa8083e7",
            "8bde11d3d65c45dba3baf1c02450eef1",
            "b19af932a7f24dc799b683136d175107",
            "e2819a7a94974a4da9b6bde754fdbc50",
            "808dd94ecaff43e0848bcfead3cf0d75",
            "4383833994854f88aa12ba0edb673d47",
            "ee2672db2047468fb49482e14127dcd3",
            "6999dc582b4c4ba68fdbfe404b1c4dfe",
            "593274c4088f4d9aa2becb6b1c6e495b"
          ]
        },
        "id": "U9gpYFnzbFKm",
        "outputId": "c7005c0a-b9a2-4c7c-f93b-e92081f7e3d6"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stderr",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: \n",
            "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
            "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
            "You will be able to reuse this secret in all of your notebooks.\n",
            "Please note that authentication is recommended but still optional to access public models or datasets.\n",
            "  warnings.warn(\n"
          ]
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Downloading data:   0%|          | 0.00/766M [00:00<?, ?B/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "4d5c6f53e82948b18b45877e2e30fd78"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "display_data",
          "data": {
            "text/plain": [
              "Generating train split: 0 examples [00:00, ? examples/s]"
            ],
            "application/vnd.jupyter.widget-view+json": {
              "version_major": 2,
              "version_minor": 0,
              "model_id": "0168cb9e015b4e119fcdc7beca2e21a3"
            }
          },
          "metadata": {}
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "Dataset({\n",
              "    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],\n",
              "    num_rows: 20000\n",
              "})"
            ]
          },
          "metadata": {},
          "execution_count": 4
        }
      ],
      "source": [
        "from datasets import load_dataset\n",
        "\n",
        "dataset = load_dataset(\"jamescalam/ai-arxiv2-chunks\", split=\"train[:20000]\")\n",
        "dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "_bP7ZW-ybFKm",
        "outputId": "491e8001-df44-465c-9979-2e2a30de9d5f"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'doi': '2401.09350',\n",
              " 'chunk-id': 1,\n",
              " 'chunk': 'These neural networks and their training algorithms may be complex, and the scope of their impact broad and wide, but nonetheless they are simply functions in a high-dimensional space. A trained neural network takes a vector as input, crunches and transforms it in various ways, and produces another vector, often in some other space. An image may thereby be turned into a vector, a song into a sequence of vectors, and a social network as a structured collection of vectors. It seems as though much of human knowledge, or at least what is expressed as text, audio, image, and video, has a vector representation in one form or another.\\nIt should be noted that representing data as vectors is not unique to neural networks and deep learning. In fact, long before learnt vector representations of pieces of data\u00e2\\x80\\x94what is commonly known as \u00e2\\x80\\x9cembeddings\u00e2\\x80\\x9d\u00e2\\x80\\x94came along, data was often encoded as hand-crafted feature vectors. Each feature quanti- fied into continuous or discrete values some facet of the data that was deemed relevant to a particular task (such as classification or regression). Vectors of that form, too, reflect our understanding of a real-world object or concept.',\n",
              " 'id': '2401.09350#1',\n",
              " 'title': 'Foundations of Vector Retrieval',\n",
              " 'summary': 'Vectors are universal mathematical objects that can represent text, images,\\nspeech, or a mix of these data modalities. That happens regardless of whether\\ndata is represented by hand-crafted features or learnt embeddings. Collect a\\nlarge enough quantity of such vectors and the question of retrieval becomes\\nurgently relevant: Finding vectors that are more similar to a query vector.\\nThis monograph is concerned with the question above and covers fundamental\\nconcepts along with advanced data structures and algorithms for vector\\nretrieval. In doing so, it recaps this fascinating topic and lowers barriers of\\nentry into this rich area of research.',\n",
              " 'source': 'http://arxiv.org/pdf/2401.09350',\n",
              " 'authors': 'Sebastian Bruch',\n",
              " 'categories': 'cs.DS, cs.IR',\n",
              " 'comment': None,\n",
              " 'journal_ref': None,\n",
              " 'primary_category': 'cs.DS',\n",
              " 'published': '20240117',\n",
              " 'updated': '20240117',\n",
              " 'references': []}"
            ]
          },
          "metadata": {},
          "execution_count": 5
        }
      ],
      "source": [
        "dataset[1]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VX6NdQhgbFKn"
      },
      "source": [
        "## Building the Knowledge Base"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MDCbqQl_bFKn"
      },
      "source": [
        "To build our knowledge base we need _two things_:\n",
        "\n",
        "1. Embeddings, for this we will use `VoyageEmbeddings` using Voyage AI's embedding models, which do need an [API key](https://dash.voyageai.com/api-keys).\n",
        "2. A vector database, where we store our embeddings and query them. We use Pinecone which again requires a [free API key](https://app.pinecone.io).\n",
        "\n",
        "First we initialize our connection to Voyage AI and define an `embed` object for embeddings:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "PzBQ_iE6bFKn"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "from getpass import getpass\n",
        "\n",
        "voyage_key = os.getenv(\"VOYAGE_API_KEY\") or getpass(\"Voyage API key: \")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "wkw0KyLRbFKo"
      },
      "outputs": [],
      "source": [
        "from langchain_community.embeddings import VoyageEmbeddings\n",
        "\n",
        "embed = VoyageEmbeddings(\n",
        "    voyage_api_key=voyage_key, model=\"voyage-2\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LhDzfsczbFKo"
      },
      "source": [
        "Then we initialize our connection to Pinecone:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "id": "j0N7EcJibFKo"
      },
      "outputs": [],
      "source": [
        "from pinecone import Pinecone\n",
        "\n",
        "# initialize connection to pinecone (get API key at app.pinecone.io)\n",
        "api_key = os.getenv(\"PINECONE_API_KEY\") or getpass(\"Pinecone API key: \")\n",
        "\n",
        "# configure client\n",
        "pc = Pinecone(api_key=api_key)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "g65RLGIpbFKo"
      },
      "source": [
        "Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "id": "8stIZYKdbFKo"
      },
      "outputs": [],
      "source": [
        "from pinecone import ServerlessSpec\n",
        "\n",
        "spec = ServerlessSpec(\n",
        "    cloud=\"aws\", region=\"us-west-2\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-8Ep3743bFKo"
      },
      "source": [
        "Before creating an index, we need the dimensionality of our Voyage AI embedding model, which we can find easily by creating an embedding and checking the length:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "DwMhLWLDbFKo",
        "outputId": "f8731679-583c-497c-93d9-b13ea6013c83"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "1024"
            ]
          },
          "metadata": {},
          "execution_count": 10
        }
      ],
      "source": [
        "vec = embed.embed_documents([\"ello\"])\n",
        "len(vec[0])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G3X7nZIabFKp"
      },
      "source": [
        "Now we create the index using our embedding dimensionality, and a metric also compatible with the model (this can be either cosine or dotproduct). We also pass our spec to index initialization."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "E6Bl7xTJbFKp",
        "outputId": "03e0b822-b150-4a6b-b83d-1cee1e4fced4"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "{'dimension': 1024,\n",
              " 'index_fullness': 0.0,\n",
              " 'namespaces': {'': {'vector_count': 20000}},\n",
              " 'total_vector_count': 20000}"
            ]
          },
          "metadata": {},
          "execution_count": 11
        }
      ],
      "source": [
        "import time\n",
        "\n",
        "index_name = \"claude-3-rag\"\n",
        "\n",
        "# check if index already exists (it shouldn't if this is first time)\n",
        "if index_name not in pc.list_indexes().names():\n",
        "    # if does not exist, create index\n",
        "    pc.create_index(\n",
        "        index_name,\n",
        "        dimension=len(vec[0]),  # dimensionality of voyage model\n",
        "        metric='dotproduct',\n",
        "        spec=spec\n",
        "    )\n",
        "    # wait for index to be initialized\n",
        "    while not pc.describe_index(index_name).status['ready']:\n",
        "        time.sleep(1)\n",
        "\n",
        "# connect to index\n",
        "index = pc.Index(index_name)\n",
        "time.sleep(1)\n",
        "# view index stats\n",
        "index.describe_index_stats()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6ZUn2lu7bFKp"
      },
      "source": [
        "### Populating our Index"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PeVD6d0sbFKp"
      },
      "source": [
        "Now our knowledge base is ready to be populated with our data. We will use the `embed` helper function to embed our documents and then add them to our index.\n",
        "\n",
        "We will also include metadata from each record."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 49,
          "referenced_widgets": [
            "78b688cc78264ad1a2f05d339fe498fe",
            "f8796aebc76a42ed80d6cf9c234b9e14",
            "93fd5108528a499cb5ae4444f0b2f770",
            "89683fb7a1a44606abc2ded05fbb3aa3",
            "f202dce30c1d4672ad357db9b6dbe9b8",
            "009627caf7fa4c51b61b6e2087545324",
            "3ff8bcf03c704243b7fd173b3cdd0d87",
            "cc0685f737fa4e15b7796e5f3fe3df8d",
            "f3cfb3b02d7a4e44b00e24aa2e7f9ff5",
            "1022918a023b4aa8a3714d9eab2978b1",
            "e6ed1f5a6e5b4c508fa3b69d9219057d"
          ]
        },
        "id": "hb00VSTqbFKp",
        "outputId": "f13e0707-2b9b-4966-df62-9e457765ebf8"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "78b688cc78264ad1a2f05d339fe498fe",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "  0%|          | 0/200 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "from tqdm.auto import tqdm\n",
        "\n",
        "# easier to work with dataset as pandas dataframe\n",
        "data = dataset.to_pandas()\n",
        "\n",
        "batch_size = 100\n",
        "\n",
        "for i in tqdm(range(0, len(data), batch_size)):\n",
        "    i_end = min(len(data), i+batch_size)\n",
        "    # get batch of data\n",
        "    batch = data.iloc[i:i_end]\n",
        "    # generate unique ids for each chunk\n",
        "    ids = [f\"{x['doi']}-{x['chunk-id']}\" for i, x in batch.iterrows()]\n",
        "    # get text to embed\n",
        "    texts = [x['chunk'] for _, x in batch.iterrows()]\n",
        "    # embed text\n",
        "    embeds = embed.embed_documents(texts)\n",
        "    # get metadata to store in Pinecone\n",
        "    metadata = [\n",
        "        {'text': x['chunk'],\n",
        "         'source': x['source'],\n",
        "         'title': x['title']} for i, x in batch.iterrows()\n",
        "    ]\n",
        "    # add to Pinecone\n",
        "    index.upsert(vectors=zip(ids, embeds, metadata))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "z6VVT3X_EMDO"
      },
      "source": [
        "Create a tool for our agent to use when searching for ArXiv papers:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "id": "X9J5jHKcEQz6"
      },
      "outputs": [],
      "source": [
        "from langchain.agents import tool\n",
        "\n",
        "@tool\n",
        "def arxiv_search(query: str) -> str:\n",
        "    \"\"\"Use this tool when answering questions about AI, machine learning, data\n",
        "    science, or other technical questions that may be answered using arXiv\n",
        "    papers.\n",
        "    \"\"\"\n",
        "    # create query vector\n",
        "    xq = embed.embed_query(query)\n",
        "    # perform search\n",
        "    out = index.query(vector=xq, top_k=5, include_metadata=True)\n",
        "    # reformat results into string\n",
        "    results_str = \"\\n\\n\".join(\n",
        "        [x[\"metadata\"][\"text\"] for x in out[\"matches\"]]\n",
        "    )\n",
        "    return results_str\n",
        "\n",
        "tools = [arxiv_search]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uN7d_4r-JMPW"
      },
      "source": [
        "When this tool is used by our agent it will execute it like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "eq4H-2RpI1U3",
        "outputId": "ca36c8cb-5fd1-47ed-cc67-f43fd9833af0"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Model Llama 2 Code Llama Code Llama - Python Size FIM LCFT Python CPP Java PHP TypeScript C# Bash Average 7B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 70B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0093 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0093 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0093 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0093 34B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 14.3% 6.8% 10.8% 9.9% 19.9% 13.7% 15.8% 13.0% 24.2% 23.6% 22.2% 19.9% 27.3% 30.4% 31.6% 34.2% 12.6% 13.2% 21.4% 15.1% 6.3% 3.2% 8.3% 9.5% 3.2% 12.6% 17.1% 3.8% 18.9% 25.9% 8.9% 24.8% \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0093 \u00e2\u009c\u0093 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0093 \u00e2\u009c\u0093 \u00e2\u009c\u0097 \u00e2\u009c\u0093 37.3% 31.1% 36.1% 30.4% 29.2% 29.8% 38.0%\n",
            "\n",
            "Ethical Considerations and Limitations (Section 5.2) Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios. For these reasons, as with all LLMs, Llama 2\u00e2\u0080\u0099s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their speci\u00ef\u00ac\u0081c applications of the model. Please see the Responsible Use Guide available available at https://ai.meta.com/llama/responsible-user-guide\n",
            "Table 52: Model card for Llama 2.\n",
            "77\n",
            "\n",
            "2\n",
            "Cove Liama Long context (7B =, 13B =, 34B) + fine-tuning ; Lrama 2 Code training 20B oes Cope Liama - Instruct Foundation models \u00e2\u0080\u0094> nfilling code training = eee.\u00e2\u0080\u009d (7B =, 13B =, 34B) \u00e2\u0080\u0094 5B (7B, 13B, 348) 5008 Python code Long context Cove Liama - PyrHon (7B, 13B, 34B) > training \u00c2\u00bb Fine-tuning > 1008 208\n",
            "Figure 2: The Code Llama specialization pipeline. The different stages of fine-tuning annotated with the number of tokens seen during training. Infilling-capable models are marked with the \u00e2\u0087\u0084 symbol.\n",
            "# 2 Code Llama: Specializing Llama 2 for code\n",
            "# 2.1 The Code Llama models family\n",
            "\n",
            "# 2 Code Llama: Specializing Llama 2 for code\n",
            "# 2.1 The Code Llama models family\n",
            "Code Llama. The Code Llama models constitute foundation models for code generation. They come in four model sizes: 7B, 13B, 34B and 70B parameters. The 7B, 13B and 70B models are trained using an infilling objective (Section 2.3), and are appropriate to be used in an IDE to complete code in the middle of a file, for example. The 34B model was trained without the infilling objective. All Code Llama models are initialized with Llama 2 model weights and trained on 500B tokens from a code-heavy dataset (see Section 2.2 for more details), except Code Llama 70B which was trained on 1T tokens. They are all fine-tuned to handle long contexts as detailed in Section 2.4.\n",
            "\n",
            "0.52 0.57 0.19 0.30 Llama 1 7B 13B 33B 65B 0.27 0.24 0.23 0.25 0.26 0.24 0.26 0.26 0.34 0.31 0.34 0.34 0.54 0.52 0.50 0.46 0.36 0.37 0.36 0.36 0.39 0.37 0.35 0.40 0.26 0.23 0.24 0.25 0.28 0.28 0.33 0.32 0.33 0.31 0.34 0.32 0.45 0.50 0.49 0.48 0.33 0.27 0.31 0.31 0.17 0.10 0.12 0.11 0.24 0.24 0.23 0.25 0.31 0.27 0.30 0.30 0.44 0.41 0.41 0.43 0.57 0.55 0.60 0.60 0.39 0.34 0.28 0.39 Llama 2 7B 13B 34B 70B 0.28 0.24 0.27 0.31 0.25 0.25 0.24 0.29 0.29 0.35 0.33 0.35 0.50 0.50 0.56 0.51 0.36 0.41 0.41\n"
          ]
        }
      ],
      "source": [
        "print(\n",
        "    arxiv_search.run(tool_input={\"query\": \"can you tell me about llama 2?\"})\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XUvJOqrNhYIh"
      },
      "source": [
        "## Defining XML Agent"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "s45dwd78hbvk"
      },
      "source": [
        "The XML agent is built primarily to support Anthropic models. Anthropic models have been trained to use XML tags like `<input>{some input}</input` or when using a tool they use:\n",
        "\n",
        "```\n",
        "<tool>{tool name}</tool>\n",
        "<tool_input>{tool input}</tool_input>\n",
        "```\n",
        "\n",
        "This is much different to the format produced by typical ReAct agents, which is not as well supported by Anthropic models.\n",
        "\n",
        "To create an XML agent we need a `prompt`, `llm`, and list of `tools`. We can download a prebuilt prompt for conversational XML agents from LangChain hub."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ntuT7UuXeMz0",
        "outputId": "c47f0a98-c71f-488f-fc55-9e1ea7f643b4"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "ChatPromptTemplate(input_variables=['agent_scratchpad', 'input', 'tools'], partial_variables={'chat_history': ''}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['agent_scratchpad', 'chat_history', 'input', 'tools'], template=\"You are a helpful assistant. Help the user answer any questions.\\n\\nYou have access to the following tools:\\n\\n{tools}\\n\\nIn order to use a tool, you can use <tool></tool> and <tool_input></tool_input> tags. You will then get back a response in the form <observation></observation>\\nFor example, if you have a tool called 'search' that could run a google search, in order to search for the weather in SF you would respond:\\n\\n<tool>search</tool><tool_input>weather in SF</tool_input>\\n<observation>64 degrees</observation>\\n\\nWhen you are done, respond with a final answer between <final_answer></final_answer>. For example:\\n\\n<final_answer>The weather in SF is 64 degrees</final_answer>\\n\\nBegin!\\n\\nPrevious Conversation:\\n{chat_history}\\n\\nQuestion: {input}\\n{agent_scratchpad}\"))])"
            ]
          },
          "metadata": {},
          "execution_count": 15
        }
      ],
      "source": [
        "from langchain import hub\n",
        "\n",
        "prompt = hub.pull(\"hwchase17/xml-agent-convo\")\n",
        "prompt"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rfdcKCdwi0SL"
      },
      "source": [
        "We can see the XML format being used throughout the prompt when explaining to the LLM how it should use tools.\n",
        "\n",
        "Next we initialize our connection to Anthropic, for this we need an [Anthropic API key](https://console.anthropic.com/)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "id": "kDHuU2uOdW91"
      },
      "outputs": [],
      "source": [
        "from langchain_anthropic import ChatAnthropic\n",
        "\n",
        "anthropic_api_key = os.getenv(\"ANTHROPIC_API_KEY\") or getpass(\"Anthropic API key: \")\n",
        "\n",
        "# chat completion llm\n",
        "llm = ChatAnthropic(\n",
        "    anthropic_api_key=anthropic_api_key,\n",
        "    model_name=\"claude-3-opus-20240229\",  # change \"opus\" -> \"sonnet\" for speed\n",
        "    temperature=0.0\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "g33Nt-xijPKG"
      },
      "source": [
        "When the agent is run we will provide it with a single `input` \u2014 this is the input text from a user. However, within the agent logic an *agent_scratchpad* object will be passed too, which will include tool information. To feed this information into our LLM we will need to transform it into the XML format described above, we define the `convert_intermediate_steps` function to handle that."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "id": "TMMBgMBlIJoq"
      },
      "outputs": [],
      "source": [
        "def convert_intermediate_steps(intermediate_steps):\n",
        "    log = \"\"\n",
        "    for action, observation in intermediate_steps:\n",
        "        log += (\n",
        "            f\"<tool>{action.tool}</tool><tool_input>{action.tool_input}\"\n",
        "            f\"</tool_input><observation>{observation}</observation>\"\n",
        "        )\n",
        "    return log"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T5_PQWVckAOi"
      },
      "source": [
        "We must also parse the tools into a string containing `tool_name: tool_description` \u2014 we handle that with the `convert_tools` function."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "id": "qxbrF5a4j9il"
      },
      "outputs": [],
      "source": [
        "def convert_tools(tools):\n",
        "    return \"\\n\".join([f\"{tool.name}: {tool.description}\" for tool in tools])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SCVI2dyUIRg6"
      },
      "source": [
        "With everything ready we can go ahead and initialize our agent object using [**L**ang**C**hain **E**xpression **L**anguage (LCEL)](https://www.pinecone.io/learn/series/langchain/langchain-expression-language/). We add instructions for when the LLM should _stop_ generating with `llm.bind(stop=[...])` and finally we parse the output from the agent using an `XMLAgentOutputParser` object."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {
        "id": "Z3yhTDmEIU4n"
      },
      "outputs": [],
      "source": [
        "from langchain.agents.output_parsers import XMLAgentOutputParser\n",
        "\n",
        "agent = (\n",
        "    {\n",
        "        \"input\": lambda x: x[\"input\"],\n",
        "        # without \"chat_history\", tool usage has no context of prev interactions\n",
        "        \"chat_history\": lambda x: x[\"chat_history\"],\n",
        "        \"agent_scratchpad\": lambda x: convert_intermediate_steps(\n",
        "            x[\"intermediate_steps\"]\n",
        "        ),\n",
        "    }\n",
        "    | prompt.partial(tools=convert_tools(tools))\n",
        "    | llm.bind(stop=[\"</tool_input>\", \"</final_answer>\"])\n",
        "    | XMLAgentOutputParser()\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MG2_hL4hkudq"
      },
      "source": [
        "With our `agent` object initialized we pass it to an `AgentExecutor` object alongside our original `tools` list:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "id": "YHW_K3WOIsXw"
      },
      "outputs": [],
      "source": [
        "from langchain.agents import AgentExecutor\n",
        "\n",
        "agent_executor = AgentExecutor(\n",
        "    agent=agent, tools=tools, verbose=True\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QRCtHauRlkLc"
      },
      "source": [
        "Now we can use the agent via the `invoke` method:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 25,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Y_Aqp20qloj7",
        "outputId": "c4fe06f3-a147-4274-becd-49c9992a8cd6"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "\n",
            "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
            "\u001b[32;1m\u001b[1;3m<tool>arxiv_search</tool>\n",
            "<tool_input>llama 2\u001b[0m\u001b[36;1m\u001b[1;3mModel Llama 2 Code Llama Code Llama - Python Size FIM LCFT Python CPP Java PHP TypeScript C# Bash Average 7B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 70B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0093 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0093 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0093 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0093 34B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 14.3% 6.8% 10.8% 9.9% 19.9% 13.7% 15.8% 13.0% 24.2% 23.6% 22.2% 19.9% 27.3% 30.4% 31.6% 34.2% 12.6% 13.2% 21.4% 15.1% 6.3% 3.2% 8.3% 9.5% 3.2% 12.6% 17.1% 3.8% 18.9% 25.9% 8.9% 24.8% \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0093 \u00e2\u009c\u0093 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0093 \u00e2\u009c\u0093 \u00e2\u009c\u0097 \u00e2\u009c\u0093 37.3% 31.1% 36.1% 30.4% 29.2% 29.8% 38.0%\n",
            "\n",
            "2\n",
            "Cove Liama Long context (7B =, 13B =, 34B) + fine-tuning ; Lrama 2 Code training 20B oes Cope Liama - Instruct Foundation models \u00e2\u0080\u0094> nfilling code training = eee.\u00e2\u0080\u009d (7B =, 13B =, 34B) \u00e2\u0080\u0094 5B (7B, 13B, 348) 5008 Python code Long context Cove Liama - PyrHon (7B, 13B, 34B) > training \u00c2\u00bb Fine-tuning > 1008 208\n",
            "Figure 2: The Code Llama specialization pipeline. The different stages of fine-tuning annotated with the number of tokens seen during training. Infilling-capable models are marked with the \u00e2\u0087\u0084 symbol.\n",
            "# 2 Code Llama: Specializing Llama 2 for code\n",
            "# 2.1 The Code Llama models family\n",
            "\n",
            "0.52 0.57 0.19 0.30 Llama 1 7B 13B 33B 65B 0.27 0.24 0.23 0.25 0.26 0.24 0.26 0.26 0.34 0.31 0.34 0.34 0.54 0.52 0.50 0.46 0.36 0.37 0.36 0.36 0.39 0.37 0.35 0.40 0.26 0.23 0.24 0.25 0.28 0.28 0.33 0.32 0.33 0.31 0.34 0.32 0.45 0.50 0.49 0.48 0.33 0.27 0.31 0.31 0.17 0.10 0.12 0.11 0.24 0.24 0.23 0.25 0.31 0.27 0.30 0.30 0.44 0.41 0.41 0.43 0.57 0.55 0.60 0.60 0.39 0.34 0.28 0.39 Llama 2 7B 13B 34B 70B 0.28 0.24 0.27 0.31 0.25 0.25 0.24 0.29 0.29 0.35 0.33 0.35 0.50 0.50 0.56 0.51 0.36 0.41 0.41\n",
            "\n",
            "Ethical Considerations and Limitations (Section 5.2) Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios. For these reasons, as with all LLMs, Llama 2\u00e2\u0080\u0099s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their speci\u00ef\u00ac\u0081c applications of the model. Please see the Responsible Use Guide available available at https://ai.meta.com/llama/responsible-user-guide\n",
            "Table 52: Model card for Llama 2.\n",
            "77\n",
            "\n",
            "Model Size FIM LCFT HumanEval MBPP pass@1 pass@10 pass@100 pass@1 pass@10 pass@100 Llama 2 Code Llama Code Llama - Python 7B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 70B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0093 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0093 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0093 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0093 34B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0093 \u00e2\u009c\u0093 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0093 \u00e2\u009c\u0093 \u00e2\u009c\u0097 \u00e2\u009c\u0093 \u00e2\u009c\u0097 \u00e2\u009c\u0093 \u00e2\u009c\u0097 \u00e2\u009c\u0093 \u00e2\u009c\u0097 \u00e2\u009c\u0093 12.2% 25.2% 20.1% 34.8% 22.6% 47.0% 30.5% 59.4% 32.3% 63.9% 34.1% 62.6% 34.1% 62.5% 33.5% 59.6% 36.6% 72.9% 36.6% 71.9% 37.8% 70.6% 36.0% 69.4% 48.2% 77.7% 48.8% 76.8% 40.2% 70.0% 38.4% 70.3% 45.7% 80.0%\u001b[0m\u001b[32;1m\u001b[1;3mBased on the information from the arXiv search, here are the key points about Llama 2:\n",
            "\n",
            "<final_answer>\n",
            "- Llama 2 is a large language model developed by Meta AI. It comes in sizes ranging from 7B to 70B parameters.\n",
            "\n",
            "- Code Llama is a version of Llama 2 that has been specialized for code generation through fine-tuning on code datasets. Code Llama models are available in Python, C++, Java, PHP, TypeScript, C#, and Bash.\n",
            "\n",
            "- The Code Llama specialization pipeline involves foundation model pre-training, long context training, code infilling training, and fine-tuning on specific programming languages. \n",
            "\n",
            "- Code Llama significantly outperforms the base Llama 2 models on code generation benchmarks like HumanEval and MBPP. For example, the 34B parameter Code Llama - Python achieves 48.8% pass@1 on HumanEval compared to 34.1% for the 34B Llama 2.\n",
            "\n",
            "- As with all large language models, Llama 2 has limitations and potential risks that need to be considered before deploying it in applications. Meta provides a responsible use guide with recommendations for safety testing and tuning.\n",
            "\u001b[0m\n",
            "\n",
            "\u001b[1m> Finished chain.\u001b[0m\n",
            "\n",
            "- Llama 2 is a large language model developed by Meta AI. It comes in sizes ranging from 7B to 70B parameters.\n",
            "\n",
            "- Code Llama is a version of Llama 2 that has been specialized for code generation through fine-tuning on code datasets. Code Llama models are available in Python, C++, Java, PHP, TypeScript, C#, and Bash.\n",
            "\n",
            "- The Code Llama specialization pipeline involves foundation model pre-training, long context training, code infilling training, and fine-tuning on specific programming languages. \n",
            "\n",
            "- Code Llama significantly outperforms the base Llama 2 models on code generation benchmarks like HumanEval and MBPP. For example, the 34B parameter Code Llama - Python achieves 48.8% pass@1 on HumanEval compared to 34.1% for the 34B Llama 2.\n",
            "\n",
            "- As with all large language models, Llama 2 has limitations and potential risks that need to be considered before deploying it in applications. Meta provides a responsible use guide with recommendations for safety testing and tuning.\n",
            "\n"
          ]
        }
      ],
      "source": [
        "user_msg = \"can you tell me about llama 2?\"\n",
        "\n",
        "out = agent_executor.invoke({\n",
        "    \"input\": user_msg,\n",
        "    \"chat_history\": \"\"\n",
        "})\n",
        "\n",
        "print(out[\"output\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Eae-JyUFl--N"
      },
      "source": [
        "That looks pretty good, but right now our agent is _stateless_ \u2014 making it hard to have a conversation with. We can give it memory in many different ways, but one the easiest ways to do so is to use `ConversationBufferWindowMemory`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 26,
      "metadata": {
        "id": "EqOMNQUfmOEr"
      },
      "outputs": [],
      "source": [
        "from langchain.chains.conversation.memory import ConversationBufferWindowMemory\n",
        "\n",
        "# conversational memory\n",
        "conversational_memory = ConversationBufferWindowMemory(\n",
        "    memory_key='chat_history',\n",
        "    k=5,\n",
        "    return_messages=True\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "O9UvnBrsnNVw"
      },
      "source": [
        "We haven't attached our conversational memory to our agent \u2014 so the `conversational_memory` object will remain empty:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 27,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "KJZNIDAslNoC",
        "outputId": "5267f069-dc09-472d-c9ef-e694cfea982f"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[]"
            ]
          },
          "metadata": {},
          "execution_count": 27
        }
      ],
      "source": [
        "conversational_memory.chat_memory.messages"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ynX9Wca6nawr"
      },
      "source": [
        "We must manually add the interactions between ourselves and the agent to our memory."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "-5hXy1FAnne3",
        "outputId": "b659e1f4-3d02-466f-9880-c0adf65fa507"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[HumanMessage(content='can you tell me about llama 2?'),\n",
              " AIMessage(content='\\n- Llama 2 is a large language model developed by Meta AI. It comes in sizes ranging from 7B to 70B parameters.\\n\\n- Code Llama is a version of Llama 2 that has been specialized for code generation through fine-tuning on code datasets. Code Llama models are available in Python, C++, Java, PHP, TypeScript, C#, and Bash.\\n\\n- The Code Llama specialization pipeline involves foundation model pre-training, long context training, code infilling training, and fine-tuning on specific programming languages. \\n\\n- Code Llama significantly outperforms the base Llama 2 models on code generation benchmarks like HumanEval and MBPP. For example, the 34B parameter Code Llama - Python achieves 48.8% pass@1 on HumanEval compared to 34.1% for the 34B Llama 2.\\n\\n- As with all large language models, Llama 2 has limitations and potential risks that need to be considered before deploying it in applications. Meta provides a responsible use guide with recommendations for safety testing and tuning.\\n')]"
            ]
          },
          "metadata": {},
          "execution_count": 28
        }
      ],
      "source": [
        "conversational_memory.chat_memory.add_user_message(user_msg)\n",
        "conversational_memory.chat_memory.add_ai_message(out[\"output\"])\n",
        "\n",
        "conversational_memory.chat_memory.messages"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T3pA0o6ZnrAl"
      },
      "source": [
        "Now we can see that _two_ messages have been added, our `HumanMessage` the the agent's `AIMessage` response. Unfortunately, we cannot send these messages to our XML agent directly. Instead, we need to pass a string in the format:\n",
        "\n",
        "```\n",
        "Human: {human message}\n",
        "AI: {AI message}\n",
        "```\n",
        "\n",
        "Let's write a quick `memory2str` helper function to handle this for us:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 29,
      "metadata": {
        "id": "raZHBdJmtGH-"
      },
      "outputs": [],
      "source": [
        "from langchain_core.messages.human import HumanMessage\n",
        "\n",
        "def memory2str(memory: ConversationBufferWindowMemory):\n",
        "    messages = memory.chat_memory.messages\n",
        "    memory_list = [\n",
        "        f\"Human: {mem.content}\" if isinstance(mem, HumanMessage) \\\n",
        "        else f\"AI: {mem.content}\" for mem in messages\n",
        "    ]\n",
        "    memory_str = \"\\n\".join(memory_list)\n",
        "    return memory_str"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 30,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "t89yX-6i3hvd",
        "outputId": "7c27cac1-3881-4a26-f59f-274d8d4affa3"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "Human: can you tell me about llama 2?\n",
            "AI: \n",
            "- Llama 2 is a large language model developed by Meta AI. It comes in sizes ranging from 7B to 70B parameters.\n",
            "\n",
            "- Code Llama is a version of Llama 2 that has been specialized for code generation through fine-tuning on code datasets. Code Llama models are available in Python, C++, Java, PHP, TypeScript, C#, and Bash.\n",
            "\n",
            "- The Code Llama specialization pipeline involves foundation model pre-training, long context training, code infilling training, and fine-tuning on specific programming languages. \n",
            "\n",
            "- Code Llama significantly outperforms the base Llama 2 models on code generation benchmarks like HumanEval and MBPP. For example, the 34B parameter Code Llama - Python achieves 48.8% pass@1 on HumanEval compared to 34.1% for the 34B Llama 2.\n",
            "\n",
            "- As with all large language models, Llama 2 has limitations and potential risks that need to be considered before deploying it in applications. Meta provides a responsible use guide with recommendations for safety testing and tuning.\n",
            "\n"
          ]
        }
      ],
      "source": [
        "print(memory2str(conversational_memory))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "q0L_80WrpWqd"
      },
      "source": [
        "Now let's put together another helper function called `chat` to help us handle the _state_ part of our agent."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 31,
      "metadata": {
        "id": "C-Ck2Lv53rD-"
      },
      "outputs": [],
      "source": [
        "def chat(text: str):\n",
        "    out = agent_executor.invoke({\n",
        "        \"input\": text,\n",
        "        \"chat_history\": memory2str(conversational_memory)\n",
        "    })\n",
        "    conversational_memory.chat_memory.add_user_message(text)\n",
        "    conversational_memory.chat_memory.add_ai_message(out[\"output\"])\n",
        "    return out[\"output\"]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XIheLeTBsO9S"
      },
      "source": [
        "Now we simply chat with our agent and it will remember the context of previous interactions."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 33,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "iJ_PH7YcA_f2",
        "outputId": "7d645e9f-f191-4b62-97fd-973417e1efaa"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "\n",
            "\n",
            "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
            "\u001b[32;1m\u001b[1;3m<tool>arxiv_search</tool>\n",
            "<tool_input>llama 2 red teaming\u001b[0m\u001b[36;1m\u001b[1;3mAfter conducting red team exercises, we asked participants (who had also participated in Llama 2 Chat exercises) to also provide qualitative assessment of safety capabilities of the model. Some participants who had expertise in offensive security and malware development questioned the ultimate risk posed by \u00e2\u0080\u009cmalicious code generation\u00e2\u0080\u009d through LLMs with current capabilities.\n",
            "One red teamer remarked, \u00e2\u0080\u009cWhile LLMs being able to iteratively improve on produced source code is a risk, producing source code isn\u00e2\u0080\u0099t the actual gap. That said, LLMs may be risky because they can inform low-skill adversaries in production of scripts through iteration that perform some malicious behavior.\u00e2\u0080\u009d\n",
            "According to another red teamer, \u00e2\u0080\u009c[v]arious scripts, program code, and compiled binaries are readily available on mainstream public websites, hacking forums or on \u00e2\u0080\u0098the dark web.\u00e2\u0080\u0099 Advanced malware development is beyond the current capabilities of available LLMs, and even an advanced LLM paired with an expert malware developer is not particularly useful- as the barrier is not typically writing the malware code itself. That said, these LLMs may produce code which will get easily caught if used directly.\u00e2\u0080\u009d\n",
            "\n",
            "Model Llama 2 Code Llama Code Llama - Python Size FIM LCFT Python CPP Java PHP TypeScript C# Bash Average 7B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 70B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0093 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0093 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0093 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0093 34B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 7B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 13B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 34B \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0097 14.3% 6.8% 10.8% 9.9% 19.9% 13.7% 15.8% 13.0% 24.2% 23.6% 22.2% 19.9% 27.3% 30.4% 31.6% 34.2% 12.6% 13.2% 21.4% 15.1% 6.3% 3.2% 8.3% 9.5% 3.2% 12.6% 17.1% 3.8% 18.9% 25.9% 8.9% 24.8% \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0093 \u00e2\u009c\u0093 \u00e2\u009c\u0097 \u00e2\u009c\u0097 \u00e2\u009c\u0093 \u00e2\u009c\u0093 \u00e2\u009c\u0097 \u00e2\u009c\u0093 37.3% 31.1% 36.1% 30.4% 29.2% 29.8% 38.0%\n",
            "\n",
            "In addition to red teaming sessions, we ran a quantitative evaluation on risk from generating malicious code by scoring Code Llama\u00e2\u0080\u0099s responses to ChatGPT\u00e2\u0080\u0099s (GPT3.5 Turbo) with LLAMAv2 70B\u00e2\u0080\u0099s safety reward model. For this second quantitative evaluation, we selected prompts that the red teamers generated specifically attempting to solicit malicious code (even though the red teaming included consideration of a broad set of safety risks). These prompts were a mix of clear intent and slightly obfuscated intentions (see some examples in Figure 16. We show a KDE plot of the distribution of the safety score for all models in Figure 7). We observe that Code Llama tends to answer with safer responses; the distribution of safety scores for Code Llama has more weight in the safer part of the range.\n",
            "False refusals. LLMs that are too safe can have a tendency to over-refuse valid claims similar to what was reported after the release of Llama 2. We specifically asked red teamers to test for this behavior. They found some limited evidence of false refusals (when not using a system preprompt). False refusals could also\n",
            "16\n",
            "\n",
            "15\n",
            "pafety Reward Model Scores Distribution on Red Teaming Prompts\n",
            "Responding Model GPT 3.5 Turbo Code Llama 138 Instruct Code Llama 34B Instruct Code Llama 7B Instruct 0.0-+ -0.2 0.0 0.2 0.4 0.6 08 1.0 12 Llama 2 70B Safety Reward Model Score\n",
            "Figure 7: KDE plot of the risk score output by the Llama 2 safety reward model on prompts with clear intent specific to code risk created by red teamers with background in cybersecurity and malware generation.\n",
            "Red teaming. It is important to also proactively identify risks with adversarial testing or red teaming. We conducted 3 red teaming exercises with 25 Meta employees, including domain experts in responsible AI, malware development, and offensive security engineering.\n",
            "The red teamers provided a nuanced evaluation specifically on the risk from so called \u00e2\u0080\u009cdual intent prompts.\u00e2\u0080\u009d Dual intent prompts are requests for help with writing code that could be used maliciously but the prompt does not directly address the topic (example \u00e2\u0080\u009cMosaic Prompts\u00e2\u0080\u009d Glukhov et al. (2023)). For example, the model rightfully refuses to provide support with writing ransomware code but it complies when asked to provide a script to encrypt all files in the user\u00e2\u0080\u0099s home directory since such a script could be used for benign purposes.\n",
            "\n",
            ". . . . . . . . . . . . . . . 3.4 RLHF Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Safety in Pretraining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.2 Safety Fine-Tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Red Teaming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.4 Safety Evaluation of Llama 2-Chat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5.1 Learnings and Observations . . . . . . . . . . . . . . . . . . . . . . . .\u001b[0m\u001b[32;1m\u001b[1;3mBased on the information from the arxiv search, some red teaming was done on the Llama 2 model during development to identify potential safety risks:\n",
            "\n",
            "<final_answer>\n",
            "- Meta conducted 3 red teaming exercises with 25 employees, including domain experts in responsible AI, malware development, and offensive security engineering. \n",
            "\n",
            "- The red teamers categorized successful attacks into four main types: 1) getting the model to provide some harmful information while refusing other content, 2) having the model roleplay specific scenarios, 3) forcing the model to highlight positives of harmful content, and 4) embedding harmful instructions within complex commands.\n",
            "\n",
            "- Some red teamers questioned the ultimate risk posed by \"malicious code generation\" through current LLMs. They noted that while LLMs being able to iteratively improve code is a risk, producing source code itself isn't the main gap. Advanced malware development is currently beyond LLM capabilities.\n",
            "\n",
            "- Quantitative evaluation was also done by scoring Code Llama's responses to malicious code prompts using Llama 2's safety reward model. Code Llama tended to give safer responses compared to GPT-3.5.\n",
            "\n",
            "- However, the full extent and details of the red teaming are limited based on the information available. The Llama 2 paper mentions expanding prompts with safety risks via red teaming, but does not go in-depth on the process or results. More information would be needed to fully characterize the red teaming performed.\n",
            "\u001b[0m\n",
            "\n",
            "\u001b[1m> Finished chain.\u001b[0m\n",
            "\n",
            "- Meta conducted 3 red teaming exercises with 25 employees, including domain experts in responsible AI, malware development, and offensive security engineering. \n",
            "\n",
            "- The red teamers categorized successful attacks into four main types: 1) getting the model to provide some harmful information while refusing other content, 2) having the model roleplay specific scenarios, 3) forcing the model to highlight positives of harmful content, and 4) embedding harmful instructions within complex commands.\n",
            "\n",
            "- Some red teamers questioned the ultimate risk posed by \"malicious code generation\" through current LLMs. They noted that while LLMs being able to iteratively improve code is a risk, producing source code itself isn't the main gap. Advanced malware development is currently beyond LLM capabilities.\n",
            "\n",
            "- Quantitative evaluation was also done by scoring Code Llama's responses to malicious code prompts using Llama 2's safety reward model. Code Llama tended to give safer responses compared to GPT-3.5.\n",
            "\n",
            "- However, the full extent and details of the red teaming are limited based on the information available. The Llama 2 paper mentions expanding prompts with safety risks via red teaming, but does not go in-depth on the process or results. More information would be needed to fully characterize the red teaming performed.\n",
            "\n"
          ]
        }
      ],
      "source": [
        "print(chat(\"was any red teaming done with the model?\"))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5p8m4Gc5w1OX"
      },
      "source": [
        "We can ask follow up questions that miss key information but thanks to the conversational history the LLM understands the context and uses that to adjust the search query. For example we asked about `red teaming` but did not mention `llama 2` \u2014 Claude 3 added this context to the search query of `\"llama 2 red teaming\"` based on the chat history."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e9bI9czPtWnl"
      },
      "source": [
        "---"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "ml",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}